Abstract


The hydrological design of hydraulic works, or the review of existing
ones, is based on design floods, which are maximum river flows
associated with low probabilities of exceedance, that is, predictions.
Their most reliable estimate is obtained through frequency analysis, a
statistical process that consists of representing the record of annual
maximum flows with a probability distribution function (PDF) or
probabilistic model, which is then used to make the desired predictions.
In this contrast study, the beta-kappa and beta-Pareto PDFs are proposed,
and the following three were considered to be widely used PDFs:
Log-Pearson type III, general extreme values, and generalized logistic.
For the first two PDFs, a summary of their theory and of their maximum
likelihood fitting method is presented. Eleven records of annual extreme
hydrological data are processed and the fits are contrasted with two
indices: the standard error of fit and the mean absolute error. The
selection of the predictions for the seven return periods (Tr) studied
was based on the lowest values of the fit errors and on the search for
representative predictions for Tr > 500 years. The conclusions suggest
including the beta-kappa and beta-Pareto distributions in frequency
analysis because of their versatility and ease of fitting.


Keywords: beta-kappa distribution, beta-Pareto distribution, maximum
likelihood fit, standard error of fit, mean absolute error, Q-Q plots,
predictions.


Resumen 


El diseño hidrológico de varias obras hidráulicas o la revisión de las 
construidas se basa en las crecientes de diseño, que son gastos máximos 
del río, asociados con bajas probabilidades de excedencia o predicciones. 
Su estimación más confiable se realiza a través del análisis de frecuencias, 
proceso estadístico que consiste en representar el registro de gastos
máximos anuales con una función de distribución de probabilidades (FDP)
o modelo probabilístico, utilizado para realizar las predicciones buscadas.
En este estudio de contraste se proponen las FDP beta-kappa y beta- 
Pareto, y se consideraron FDP de uso generalizado las tres siguientes: la 
Log-Pearson tipo III, la general de valores extremos y la logística 
generalizada. Por lo anterior, se expone para las dos primeras FDP un 
resumen de su teoría y su método de ajuste por máxima verosimilitud. 
Se procesan 11 registros de datos hidrológicos extremos anuales y se 
contrastan los ajustes con dos índices: el error estándar de ajuste y el 
error absoluto medio. La selección de las predicciones en los siete 
periodos de retorno (Tr) estudiados se basó en los valores menores de los 
errores de ajuste y en la búsqueda de predicciones representativas, en 
los Tr > 500 años. Las conclusiones sugieren la inclusión de las 
distribuciones beta-kappa y beta-Pareto en los análisis de frecuencias 


debido a su versatilidad y facilidad de ajuste. 


Palabras clave: distribución beta-kappa, distribución beta-Pareto, 
ajuste por máxima verosimilitud, error estándar de ajuste, error absoluto 


medio, gráficos Q-Q, predicciones. 
Introduction


Stages of frequency analysis 


The hydrological dimensioning of hydraulic works such as protection
dykes, canalizations and bridges, as well as the various urban drainage
structures, is based on Design Floods (CD, by its acronym in Spanish).
The most accurate hydrological estimation of CDs is obtained through
Frequency Analysis (AF, by its acronym in Spanish), a statistical
procedure that consists of interpreting or characterizing the available
record of hydrological events, for example floods or maximum rainfall, in
terms of their future probabilities of occurrence (Bobée & Ashkar, 1991).


AF involves the following five stages: (1) integration and verification
of the randomness of the record or available sample; (2) selection of the
probability distribution function (PDF) or probabilistic model that will
represent the data and allow estimates or predictions associated with low
probabilities of exceedance; (3) fitting of the various PDFs tested,
that is, obtaining their fit parameters with the various available methods;
(4) evaluation of the statistical quality of the fit achieved between the
data and the PDF, by means of graphs and diagnostic indices; and (5)
selection of the results (Kite, 1977; Bobée & Ashkar, 1991; Rao & Hamed,
2000; Meylan, Favre, & Musy, 2012; Stedinger, 2017; Teegavarapu,
Salas, & Stedinger, 2019).


In this contrast study, in stage one, four joint records of peak flow
and volume of annual floods and three records of annual maximum daily
rainfall were selected. In total, eleven series of extreme hydrological
data were processed and their randomness was verified with the
Wald-Wolfowitz test. In stage two, the objective of the study is
addressed by selecting the beta-kappa (BEK) and beta-Pareto (BEP) PDFs
and contrasting them against three PDFs of general application: the
Log-Pearson type III (LP3), the General Extreme Values (GVE) and the
Generalized Logistic (LOG). All the cited PDFs have three fit parameters.


In stage three, the BEK and BEP PDFs were fitted using the maximum
likelihood method proposed by Mielke and Johnson (1974). The LP3
distribution was fitted with its classical method of moments in the
logarithmic domain (WRC, 1977), and the GVE and LOG models with the
method of L-moments (Hosking & Wallis, 1997).


For stage four, two indices (EEA and EAM) were calculated: the
standard error of fit (Kite, 1977; Chai & Draxler, 2014) and the mean
absolute error (Willmott & Matsuura, 2005). Finally, for the fifth stage,
the selection of results, the EEA and EAM values were taken into account,
as well as the values obtained for the predictions.


Background on BEK and BEP distributions 


Strupczewski, Markiewicz, Kochanek and Singh (2008) indicate that there
are very few references on the use of PDFs with two shape parameters
to model extreme hydrological events, and cite the following two. The
system of distributions proposed in 1941 by the French statistician
Halphen has a lower bound of zero and two shape parameters, but it was
abandoned because of its mathematical complexity. On the other hand, the
distributions that Mielke and Johnson (1974) designated BEK and BEP are
models with two shape parameters and one scale parameter.


Wilks (1993), in a pioneering study contrasting three-parameter PDFs
with maximum precipitation data, processed both as annual series and as
partial duration series (magnitudes greater than a threshold value),
found that the BEK distribution describes the annual series fairly well
and that the BEP model fits partial duration series better.


Campos-Aranda (1998) presented various applications of the BEP
distribution. Mason, Waylen, Mimmack, Rajaratnam and Harrison (1999)
use the BEK and BEP PDFs in a change detection study for extreme rainfall
events.

Oztekin (2007) contrasts the BEK and BEP models against the
Wakeby distribution, finding that the latter leads to better or similar fits
in maximum rainfall records. Murshed, Kim and Park (2011) present, for
the BEK distribution, the estimation of its fit parameters by means of the
methods of moments and of L-moments. Finally, Nguyen, El Outayek, Lim
and Nguyen (2017) include the BEK and BEP PDFs in their contrast study
of annual maximum rainfall, based on the descriptive and predictive
abilities of the PDFs.


Objectives 


This contrast study had three objectives: (1) to present a
summary of the theory of the BEK and BEP distributions; (2) to describe
in detail their maximum likelihood fitting method, through the iterative
equations for the estimation of their three fit parameters; and (3) to
perform a goodness-of-fit and prediction contrast between the BEK and
BEP distributions and the three of general application (LP3, GVE and
LOG).


Generally speaking, the AF statistical technique rests on debatable
assumptions at each of its five stages: from the representativeness of
the available record for the future, through the selection of several
PDFs to make the desired predictions, to the adoption of results based on
graphs and diagnostic indices. It is stage two of AF that opens the
possibility of testing new probabilistic models since, as is known, no
PDF is best in general and its suitability depends on each record
processed.


Methods and materials 


Equations of the BEK and BEP distributions 


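Mielke and Johnson (1974) present the PDF and the probability density
function (pdf) of the generalized beta random variable of the second kind
x:

F(x) = \frac{(x/\beta)^{\gamma}\; {}_2F_1\left[\alpha+1,\ \gamma/\theta;\ \gamma/\theta+1;\ -(x/\beta)^{\theta}\right]}{(\gamma/\theta)\, B(\gamma/\theta,\ \alpha+1-\gamma/\theta)}, \quad x > 0    (1)

F(x) = 0, \quad x \le 0    (2)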
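f(x) = \frac{\theta}{\beta\, B(\gamma/\theta,\ \alpha+1-\gamma/\theta)} \left(\frac{x}{\beta}\right)^{\gamma-1} \left[1+\left(\frac{x}{\beta}\right)^{\theta}\right]^{-(\alpha+1)}, \quad x > 0    (3)

f(x) = 0, \quad x \le 0    (4)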


in which α > 0, β > 0, θ > 0 and 0 < γ < θ(α + 1). The numerator of
Equation (1) is the following Gaussian hypergeometric series
(Oberhettinger, 1972):


{}_2F_1(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(a)\,\Gamma(b)} \sum_{n=0}^{\infty} \frac{\Gamma(n+a)\,\Gamma(n+b)}{\Gamma(n+c)} \frac{z^{n}}{n!}    (5)


and the denominator function is: 


B(\delta, \varepsilon) = \Gamma(\delta)\,\Gamma(\varepsilon) / \Gamma(\delta + \varepsilon)    (6)


Mielke and Johnson (1974) indicate that the calculations associated
with equations (1) and (3) are not simple, but that by imposing either of
two restrictions on the parameter γ, distributions with important
computational advantages are obtained. The first restriction is γ = αθ,
which leads to the beta-κ (BEK) distribution, so named because of its
similarity to Mielke's kappa distribution (Mielke, 1973), and whose PDF
and pdf are:


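F(x) = \left[\frac{(x/\beta)^{\theta}}{1+(x/\beta)^{\theta}}\right]^{\alpha}, \quad x > 0    (7)

f(x) = \frac{\alpha\theta}{\beta} \left(\frac{x}{\beta}\right)^{\alpha\theta-1} \left[1+\left(\frac{x}{\beta}\right)^{\theta}\right]^{-(\alpha+1)}, \quad x > 0    (8)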


with α > 0, β > 0 and θ > 0. β is the scale parameter and α and θ are the
shape parameters. The quantile function, designating F(x) = p, is as
follows:


x(p) = \beta \left[\frac{p^{1/\alpha}}{1-p^{1/\alpha}}\right]^{1/\theta}    (9)


The second restriction is γ = θ, which defines the beta-P (BEP)
distribution, so designated because of its resemblance to the Pareto-type
probabilistic model. Its PDF and pdf are:


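F(x) = 1 - \left[1+\left(\frac{x}{\beta}\right)^{\theta}\right]^{-\alpha}, \quad x > 0    (10)

f(x) = \frac{\alpha\theta}{\beta} \left(\frac{x}{\beta}\right)^{\theta-1} \left[1+\left(\frac{x}{\beta}\right)^{\theta}\right]^{-(\alpha+1)}, \quad x > 0    (11)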


with α > 0, β > 0 and θ > 0. Again, β is the scale parameter and α and θ
are the shape parameters. The quantile function, designating F(x) = p, is
as follows:


x(p) = \beta \left[(1-p)^{-1/\alpha} - 1\right]^{1/\theta}    (12)
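
As an illustration of the closed forms above, the following minimal Python sketch codes the PDF and quantile function of each distribution and checks that they are mutual inverses. The function names and the parameter values in the usage lines are hypothetical choices made for this example; the formulas coded are F = [u/(1+u)]^α and F = 1 - (1+u)^(-α), with u = (x/β)^θ, together with their algebraic inverses.

```python
def bek_cdf(x, beta, alpha, theta):
    """Beta-kappa PDF (cumulative): F = [u / (1 + u)]**alpha, u = (x/beta)**theta."""
    if x <= 0:
        return 0.0
    u = (x / beta) ** theta
    return (u / (1.0 + u)) ** alpha


def bek_quantile(p, beta, alpha, theta):
    """Inverse of bek_cdf for 0 < p < 1."""
    r = p ** (1.0 / alpha)
    return beta * (r / (1.0 - r)) ** (1.0 / theta)


def bep_cdf(x, beta, alpha, theta):
    """Beta-Pareto PDF (cumulative): F = 1 - (1 + u)**(-alpha), u = (x/beta)**theta."""
    if x <= 0:
        return 0.0
    u = (x / beta) ** theta
    return 1.0 - (1.0 + u) ** (-alpha)


def bep_quantile(p, beta, alpha, theta):
    """Inverse of bep_cdf for 0 < p < 1."""
    return beta * ((1.0 - p) ** (-1.0 / alpha) - 1.0) ** (1.0 / theta)


# Illustrative (hypothetical) parameter values: scale 50, shapes 0.8 and 5.
params = (50.0, 0.8, 5.0)
for p in (0.5, 0.9, 0.99):
    x_bek = bek_quantile(p, *params)
    x_bep = bep_quantile(p, *params)
    print(p, round(x_bek, 2), round(x_bep, 2))
```

Because both quantile functions are explicit, no numerical root finding is needed to obtain predictions or the ordinates of a Q-Q plot.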
BEK and BEP by maximum likelihood fit


Mielke and Johnson (1974) describe the procedure for calculating the fit
parameters (α, β, θ) with the maximum likelihood method, by means of
three equations applied iteratively (index j), starting from initial
values β₀ and θ₀. For the BEK distribution these expressions are:


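\alpha_j = n \left\{ \sum_{i=1}^{n} \ln\left[ 1 + (x_i/\beta_{j-1})^{-\theta_{j-1}} \right] \right\}^{-1}    (13)

\beta_j = \frac{1}{n} \left( 1 + \frac{1}{\alpha_j} \right) \beta_{j-1} \sum_{i=1}^{n} \left[ 1 + (x_i/\beta_{j-1})^{-\theta_{j-1}} \right]^{-1}    (14)

\theta_j = n \left\{ \sum_{i=1}^{n} \left[ \frac{(x_i/\beta_j)^{\theta_{j-1}} - \alpha_j}{1 + (x_i/\beta_j)^{\theta_{j-1}}} \right] \ln(x_i/\beta_j) \right\}^{-1}    (15)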


For the BEP distribution its iterative equations are: 


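\alpha_j = n \left\{ \sum_{i=1}^{n} \ln\left[ 1 + (x_i/\beta_{j-1})^{\theta_{j-1}} \right] \right\}^{-1}    (16)

\beta_j = \frac{1}{n} (1 + \alpha_j)\, \beta_{j-1} \sum_{i=1}^{n} \left[ 1 + (x_i/\beta_{j-1})^{-\theta_{j-1}} \right]^{-1}    (17)

\theta_j = n \left\{ \sum_{i=1}^{n} \left[ \frac{\alpha_j (x_i/\beta_j)^{\theta_{j-1}} - 1}{1 + (x_i/\beta_j)^{\theta_{j-1}}} \right] \ln(x_i/\beta_j) \right\}^{-1}    (18)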
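
The successive-substitution scheme can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure as coded by Mielke and Johnson (1974): the update formulas are the fixed-point forms obtained by setting to zero the partial derivatives of each log-likelihood, and the order in which old and new iterates enter each update (α from the previous β and θ, then β, then θ) is an assumption. The function names are hypothetical. No convergence is guaranteed, so the number of sweeps is left to the caller.

```python
import math


def bek_update(x, beta, theta):
    """One sweep of the assumed BEK updates; returns (alpha_j, beta_j, theta_j)."""
    n = len(x)
    u_inv = [(xi / beta) ** (-theta) for xi in x]
    alpha = n / sum(math.log(1.0 + v) for v in u_inv)
    beta_new = (1.0 + 1.0 / alpha) * beta / n * sum(1.0 / (1.0 + v) for v in u_inv)
    u = [(xi / beta_new) ** theta for xi in x]
    denom = sum((ui - alpha) / (1.0 + ui) * math.log(xi / beta_new)
                for xi, ui in zip(x, u))
    return alpha, beta_new, n / denom


def bep_update(x, beta, theta):
    """One sweep of the assumed BEP updates; returns (alpha_j, beta_j, theta_j)."""
    n = len(x)
    u_old = [(xi / beta) ** theta for xi in x]
    alpha = n / sum(math.log(1.0 + v) for v in u_old)
    beta_new = (1.0 + alpha) * beta / n * sum(v / (1.0 + v) for v in u_old)
    u = [(xi / beta_new) ** theta for xi in x]
    denom = sum((alpha * ui - 1.0) / (1.0 + ui) * math.log(xi / beta_new)
                for xi, ui in zip(x, u))
    return alpha, beta_new, n / denom


def iterate(update, x, beta0, theta0, iterations):
    """Apply `update` a fixed number of times starting from (beta0, theta0)."""
    alpha, beta, theta = None, beta0, theta0
    for _ in range(iterations):
        alpha, beta, theta = update(x, beta, theta)
    return alpha, beta, theta
```

A call such as `iterate(bek_update, sample, beta0, 5.0, j)` would take the initial β₀ and θ₀ and the number of sweeps j as inputs; the stopping rule is a user decision rather than a convergence test.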
Contrast distributions and their fitting


The LP3 distribution was fitted with the classic method of moments in the 
logarithmic domain (WRC, 1977). In contrast, the GVE and LOG 
distributions were fitted based on the L moments method, which has been 
shown to be efficient and robust, even in small samples, according to 
equations set forth by Hosking and Wallis (1997). 


Campos-Aranda (2002) presented six methods of fitting the LP3
distribution and delimited their applicability based on the estimation
ratio (CE), defined as:

CE = \frac{X_{100} - X_{50}}{X_{50} - X_{25}}    (19)


in which X_Tr is the prediction for the return period (Tr) expressed in
years, equal to the reciprocal of the exceedance probability (q) when
series or samples of annual maximum hydrological events are processed.
When CE < 1.00, the best fitting methods are the mixture of moments
and the moments in the real domain, and when CE > 1.30, the most
convenient methods are maximum likelihood and maximum entropy.
When CE lies between these limits, the method of moments in the
logarithmic domain is acceptable and generally leads to a good fit.
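
The rule above can be illustrated with a short Python sketch. It assumes that the estimation ratio is CE = (X100 - X50)/(X50 - X25), the form that reproduces the CE values quoted in the Results for records 4 to 7 from the LP3 predictions of Table 5; the function names and the returned labels are hypothetical.

```python
def estimation_ratio(x25, x50, x100):
    """CE = (X100 - X50) / (X50 - X25), from predictions at Tr = 25, 50, 100 years."""
    return (x100 - x50) / (x50 - x25)


def suggested_lp3_methods(ce):
    """Fitting methods suggested for the LP3 according to the CE thresholds in the text."""
    if ce < 1.00:
        return "mixture of moments; moments in the real domain"
    if ce > 1.30:
        return "maximum likelihood; maximum entropy"
    return "moments in the logarithmic domain"


# LP3 predictions (Tr = 25, 50, 100 years) taken from Table 5.
lp3_predictions = {
    4: (2109, 2501, 2899),
    5: (7779, 14739, 27483),
    6: (199, 346, 584),
    7: (5127, 6550, 8294),
}
for record, (x25, x50, x100) in lp3_predictions.items():
    ce = estimation_ratio(x25, x50, x100)
    print(record, round(ce, 3), suggested_lp3_methods(ce))
```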


On the other hand, the General Extreme Values (GVE) distribution is
derived from the Fisher-Tippett-Gnedenko theorem, which establishes the
GVE, with its three particular cases depending on the value of its shape
parameter, as the probabilistic model of annually sampled maximum
values. However, as indicated at the end of the Objectives section,
other distributions continue to be proposed and used, because of the
random nature of extreme hydrological data records, in the search for
their ideal distribution based on the statistical indicators of the fit
achieved.


Q-Q diagnostic graph 


Nguyen et al. (2017) have suggested two evaluations to select the optimal
PDF for a record of extreme hydrological data: (1) descriptive ability
and (2) predictive ability. The first refers to the accuracy with which the
PDF being tested reproduces the sample data, and the second is
associated with the variability of its predictions in relation to the
dispersion of the sample predictions. There are three techniques to test
descriptive ability: (1) diagnostic charts; (2) statistical tests; and (3)
goodness-of-fit indices (Meylan et al., 2012).


The P-P (empirical versus estimated probability) and Q-Q (observed
versus estimated quantile) diagnostic plots have become popular (Coles,
2001; Wilks, 2011) and provide a simple and effective way to compare
the results of a contrasted PDF. The data x_i of a sample, sorted from
smallest to largest, are assigned an empirical probability (p), for
example with the Cunnane formula, which according to Stedinger (2017)
leads to approximately unbiased values for most of the PDFs used in
hydrology:

p = \frac{m - 0.40}{n + 0.20}    (20)


in which m is the order number of the datum and n the total number of
data. For each datum x_i, its probability is obtained with the equation
of the tested PDF; for the BEK and BEP distributions, with expressions
(7) and (10). The P-P graph is defined by the following abscissa and
ordinate points:

\left[ \frac{i - 0.40}{n + 0.20},\ F(x_i) \right] \quad \text{for } i = 1, 2, \ldots, n    (21)


The Q-Q graph uses equations (9) and (12), the inverse solutions of
the BEK and BEP PDFs, to define the ordinates, and is made up of the
following points:

\left[ x_i,\ x(p_i) \right] \quad \text{for } i = 1, 2, \ldots, n    (22)


The disadvantage of diagnostic graphs lies in the subjective
assessment that is made when comparing various PDFs, since a numerical
value is not available (Nguyen et al., 2017). Campos-Aranda (2019)
regards the Q-Q graph as the more useful one for detecting overestimated
predictions (points above the 45° line) or underestimated ones (points
below it).
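
The construction of the Q-Q points can be sketched in a few lines of Python. The sketch assumes the Cunnane plotting position p = (m - 0.40)/(n + 0.20) and takes the quantile function of the tested PDF as an argument; the function names, the sample and the quantile function in the usage lines are hypothetical and purely illustrative.

```python
def cunnane_positions(n):
    """Empirical non-exceedance probabilities for order numbers m = 1..n."""
    return [(m - 0.40) / (n + 0.20) for m in range(1, n + 1)]


def qq_points(sample, quantile_fn):
    """Pairs (observed x_i, estimated x(p_i)) with the data sorted ascending."""
    xs = sorted(sample)
    ps = cunnane_positions(len(xs))
    return [(x, quantile_fn(p)) for x, p in zip(xs, ps)]


# Illustrative use with a made-up sample and a made-up quantile function.
sample = [41.9, 15.9, 63.6, 50.9, 20.6]
points = qq_points(sample, lambda p: 70.0 * p)
for observed, estimated in points:
    side = "above" if estimated > observed else "on or below"
    print(observed, round(estimated, 1), side, "the 45-degree line")
```

Points whose second coordinate exceeds the first lie above the 45° line, which is the visual cue for overestimation described in the text.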
Standard Error of Fit (EEA)


Goodness-of-fit indices have the advantage of being easy to calculate and
commonly involve the difference between the observed values x_i and the
values x̂_i estimated with the PDF being tested. The EEA is the most
common (Chai & Draxler, 2014); it was established in the mid-1970s
(Kite, 1977) and has been applied in Mexico using Weibull's empirical
formula (Benson, 1962). Here it is applied using the Cunnane formula,
Equation (20). The expression of the EEA is:

EEA = \left[ \frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{n - np} \right]^{1/2}    (23)

where x_i are the n observed data sorted from smallest to largest; x̂_i
are the estimates obtained, for the probability given by Equation (20),
with the PDF being contrasted; and np is the number of fit parameters of
the PDF, three for those applied in this study.


Mean Absolute Error (EAM) 


Its advantages lie in having the units of the variable, just like the EEA,
and in preventing the impact of scattered values from being squared;
therefore EEA ≥ EAM (Willmott & Matsuura, 2005). Its expression is
(Nguyen et al., 2017):


Tecnología y ciencias del agua, ISSN 2007-2422. Open Access under the
CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/)


EAM = \frac{\sum_{i=1}^{n} |x_i - \hat{x}_i|}{n - np}    (24)
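
Both indices can be computed with the short Python sketch below. It assumes that the two expressions share the denominator n - np, with np = 3 by default, and that the observed and estimated values are paired in the same (ascending) order; the function names and the numbers in the usage lines are hypothetical.

```python
import math


def eea(observed, estimated, n_params=3):
    """Standard error of fit: sqrt(sum of squared differences / (n - np))."""
    n = len(observed)
    sse = sum((x - xh) ** 2 for x, xh in zip(observed, estimated))
    return math.sqrt(sse / (n - n_params))


def eam(observed, estimated, n_params=3):
    """Mean absolute error: sum of absolute differences / (n - np)."""
    n = len(observed)
    sae = sum(abs(x - xh) for x, xh in zip(observed, estimated))
    return sae / (n - n_params)


# Made-up example: seven sorted data and their estimates.
observed = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
estimated = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 66.0]
print(eea(observed, estimated), eam(observed, estimated))
```

In this example a single large deviation weighs more in the EEA than in the EAM, which is the behaviour the text attributes to squaring the differences.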


Processed hydrological data records 


Aldama, Ramírez, Aparicio, Mejía-Zermeño and Ortega-Gil (2006)
present the joint annual records of peak flow (Qp), in m³/s, and volume
(Vol), in millions of m³ (Mm³), of the inflow floods at 15 important
reservoirs in Mexico and one planned reservoir. From these joint records,
the three considered complicated in their probabilistic analysis were
selected, since they include scattered values (outliers) and show large
differences between their minimum and maximum values. The first
corresponds to the 43 inflow data of the El Infiernillo dam on the Balsas
River, between the states of Michoacán and Guerrero, which has a basin
area of 108,000 km². The second comprises the 52 inflow data of the
Huites dam on the Fuerte River, in the state of Sinaloa, with a basin
area of 26,020 km². The third comprises the 37 inflow data of the
Guamúchil dam on the Mocorito River, also in the state of Sinaloa, with
a basin area of 1,630 km².


In addition, Domínguez and Arganis (2012) present the joint records
of Qp and Vol, with 47 inflow data, of the Malpaso dam on the Grijalva
River, in the state of Chiapas, Mexico, with a basin area of 34,800 km².
These flood records were also processed.


Finally, three records of annual maximum daily precipitation (PMD),
one from a pluviometric station in each geographical area of the state of
San Luis Potosí, Mexico, were analyzed. From the Altiplano, the Los
Filtros station (n = 66), located in the city of San Luis Potosí, was
processed; from the Middle Zone, the station located in the city of Río
Verde (n = 52); and from the Huasteca region, the station in the town of
Tanquián de Escobedo (n = 52). The altitudes of these pluviometric
stations are 1904, 987 and 87 meters above sea level, respectively. These
records were analyzed by Campos-Aranda (2019) to obtain their optimal
PDFs and were compiled from the CONAGUA monthly Excel file provided
to the author; they are therefore reproduced in Table 1.


Table 1. Records of annual PMD (millimeters) at the three pluviometric
stations indicated in the state of San Luis Potosí, Mexico.

No.   Los Filtros              Rio Verde                Tanquian                 No.
      (1949-2014)              (1961-2013*)             (1961-2014**)
 1    15.9          66.5       46.4          28.2       107.0         120.0      34
 2    20.6          26.0       52.2          61.8       [illegible]   62.0       35
 3    50.9          [illegible] 33.4         37.8       90.3          64.0       36
 4    40.5          46.5       27.0          55.6       171.0         132.0      37
 5    63.6          44.0       39.2          [illegible] 81.5         80.0       38
 6    41.9          41.0       79.0          44.0       176.0         72.0       39
 7    60.0          55.0       43.1          74.1       176.5         82.0       40
 8    39.9          21.5       40.5          51.7       [illegible]   88.0       41
 9    48.6          29.8       57.7          34.0       78.0          185.0      42
10    63.0          41.5       63.7          41.5       109.0         105.0      43
11    35.5          25.4       81.5          43.5       84.0          67.0       44
12    40.0          59.0       52.0          [illegible] 87.0         201.0      45
13    63.2          33.5       48.5          126.3      111.0         89.0       46
14    39.4          46.5       51.3          58.5       99.5          200.9      47
15    27.2          51.0       117.5         55.1       166.5         90.0       48
16    59.0          40.0       [illegible]   99.1       113.5         110.0      49
17    32.0          31.0       61.8          81.4       148.0         68.0       50
18    30.0          45.5       83.4          31.8       117.0         89.0       51
19    40.2          23.9       71.7          63.2       85.0          190.0      52
20    31.5          20.7       35.0          -          73.0          -          53
21    31.5          37.5       87.0          -          103.0         -          54
22    52.0          40.2       86.3          -          79.0          -          55
23    52.3          111.0      37.1          -          81.5          -          56
24    3.3           43.3       30.2          -          113.5         -          57
25    35.0          76.9       87.1          -          94.0          -          58
26    28.5          42.8       46.7          -          54.0          -          59
27    57.2          46.1       19.3          -          86.0          -          60
28    58.0          42.5       51.0          -          98.0          -          61
29    42.9          45.3       61.0          -          63.0          -          62
30    26.4          44.5       92.3          -          200.8         -          63
31    65.5          26.0       97.4          -          106.0         -          64
32    22.0          59.1       38.7          -          370.0         -          65
33    [illegible]   44.1       43.9          -          84.0          -          66


* one year missing. 


** two years missing. 


Wald-Wolfowitz test 


This nonparametric test has been used by Bobée and Ashkar (1991), Rao
and Hamed (2000), and Meylan et al. (2012) to test independence and
stationarity in records of annual maximum flows (x_i). For this reason,
the test was applied to the annual Qp, Vol and PMD records, which must be
samples of random values. A. Wald and J. Wolfowitz, based on the work of
R. L. Anderson on the serial correlation coefficient, developed this
test, whose statistic is:

R = \sum_{i=1}^{n-1} x_i x_{i+1} + x_1 x_n    (25)


When the size (n) of the series or sample (x_i) is not small and its
data are independent, R comes from a Normal distribution with mean and
variance given by the following expressions:

E[R] = \bar{R} = \frac{S_1^2 - S_2}{n - 1}    (26)

Var[R] = \frac{S_2^2 - S_4}{n - 1} + \frac{S_1^4 - 4 S_1^2 S_2 + 4 S_1 S_3 + S_2^2 - 2 S_4}{(n - 1)(n - 2)} - \bar{R}^2    (27)

in which:

S_r = \sum_{i=1}^{n} x_i^r    (28)

Finally, U is calculated with the equation:

U = \frac{R - \bar{R}}{\sqrt{Var[R]}}    (29)

The value of U follows a standard Normal distribution N(0, 1) and can be
used to test the independence of the series data at a significance level
α, commonly 5 %. In a two-tailed test, the standardized normal variable
is z_{α/2} = 1.96; then, when the absolute value of U is less than 1.96,
the series is made up of independent values (random sample).
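
The test can be sketched in Python as follows. The sketch assumes the usual form of the Wald-Wolfowitz statistic, with a circular sum of products of consecutive values and power sums S_r of the data; the function names are hypothetical, and the 1.96 threshold corresponds to the 5 % two-tailed test described above.

```python
import math


def wald_wolfowitz_u(x):
    """Standardized Wald-Wolfowitz statistic U for a series x in time order."""
    n = len(x)
    # Circular serial product: x1*x2 + ... + x(n-1)*xn + x1*xn
    r = sum(x[i] * x[i + 1] for i in range(n - 1)) + x[0] * x[-1]
    s1, s2, s3, s4 = (sum(v ** k for v in x) for k in (1, 2, 3, 4))
    mean_r = (s1 ** 2 - s2) / (n - 1)
    var_r = ((s2 ** 2 - s4) / (n - 1)
             + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
             / ((n - 1) * (n - 2))
             - mean_r ** 2)
    return (r - mean_r) / math.sqrt(var_r)


def is_random(x, z_crit=1.96):
    """True when |U| is below the two-tailed critical value (5 % by default)."""
    return abs(wald_wolfowitz_u(x)) < z_crit


# Tiny made-up series, only to show the call; real records are much longer.
print(round(wald_wolfowitz_u([1.0, 2.0, 3.0, 4.0]), 4))
```

The series must be supplied in chronological order, since the statistic depends on which values are neighbours; sorting the data first would defeat the test.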
Results and their discussion


Test of randomness and ratios of L moments 


The third column of Table 2 shows the values of the U statistic
(Equation (29)), which establish that the 11 records processed are
random. The remaining columns show the arithmetic mean and the L-moment
ratios of each record.
Table 2. General data and L-moment ratios of the 11 records
processed.

No.  Record             U         x̄           t3         t4
 1   El Infiernillo Qp  -0.707    5499.512    0.49875    0.36082
 2   El Infiernillo Vol -0.022    2244.791    0.37866    0.16834
 3   Huites Qp          -0.090    3305.135    0.49313    0.30438
 4   Huites Vol         -0.602    841.769     0.30773    0.17585
 5   Guamuchil Qp       -1.418    1610.854    0.70597    0.51098
 6   Guamuchil Vol       1.043    38.747      0.57062    0.28975
 7   Malpaso Qp         -0.666    2153.234    0.40782    0.25020
 8   Malpaso Vol         0.558    1583.168    0.29264    0.13777
 9   Los Filtros PMD    -0.616    43.005      0.13516    0.16115
10   Rio Verde PMD       0.179    58.442      0.20926    0.09728
11   Tanquian PMD       -0.746    112.086     0.36652    0.22984

U = statistic of the Wald-Wolfowitz test.

x̄ = arithmetic mean, in m³/s, Mm³ or millimeters.

t3 = L-moment ratio of asymmetry.

t4 = L-moment ratio of kurtosis.


The original records of the Guamúchil hydrometric station presented
by Aldama et al. (2006) include, at the end, four extreme maximum Qp and
Vol values, obtained at the Eustaquio Buelna dam by inverse flood
routing, which make these records non-random (U = 4.156 and U = 2.597).
After eliminating these four Qp and Vol data, the U values shown in
Table 2 are obtained.


Fitting of the BEK distribution 


Taking into account that the fit parameters of the beta distributions are
obtained by successive substitutions (j) or iterations, the models
considered to be in general use, that is, the LP3, GVE and LOG
distributions, were fitted first to each of the 11 records processed, so
that their results could serve as a reference.


With the above, the fit errors (EEA and EAM) and the predictions for
the seven return periods analyzed (25, 50, 100, 500, 1000, 5000 and
10000 years) are available for the three distributions mentioned. In the
iterative fit of the beta distributions, the fit error values were used
as magnitudes not to be exceeded, and the predictions for return periods
of less than 100 years as values to match.


The fit parameters of the BEK distribution were estimated with
equations (13) to (15), using as initial values of β₀ and θ₀ the
arithmetic mean (Table 2) and a value of 5.0, respectively. In each
iteration (j), the standard error of fit (Equation (23)) and the mean
absolute error (Equation (24)) were evaluated, using formulas (9) and
(20) to obtain the estimated values x̂_i. The comparison of the errors
obtained with the BEK and of its predictions defined the number of
iterations carried out, which are shown in Table 3, together with the fit
parameters obtained. The maximum number of iterations was limited to 500.


Table 3. Number of iterations (j) and values of the fit parameters
(β, α, θ) of the BEK distribution in the 11 records processed.

No.  Record              j     β           α           θ
 1   El Infiernillo Qp   274   2619.301    2.576858    2.538683
 2   El Infiernillo Vol  3     2463.009    0.517784    2.692160
 3   Huites Qp           27    1488.418    1.888118    2.045599
 4   Huites Vol          3     923.356     0.553249    3.152180
 5   Guamuchil Qp        37    304.088     2.013669    1.384388
 6   Guamuchil Vol       4     43.366      0.397784    1.791456
 7   Malpaso Qp          500   701.759     8.699795    2.733231
 8   Malpaso Vol         2     1936.030    0.461082    3.178971
 9   Los Filtros PMD     1     42.399      0.862399    5.390516
10   Rio Verde PMD       1     57.438      0.800002    4.660408
11   Tanquian PMD        200   50.479      7.188239    3.506981
Fitting of the BEP distribution


A procedure identical to the previous one was followed for the BEP
fit, now using equations (16) to (18) for its fit parameters and
expressions (12) and (20) for the estimated values x̂_i, in order to
evaluate the fitting errors (equations (23) and (24)) and the predictions
(Equation (12)). The results are shown in Table 4.


Table 4. Number of iterations (j) and values of the fit parameters
(β, α, θ) of the BEP distribution in the 11 records processed.

No.  Record              j            β           α           θ
 1   El Infiernillo Qp   70           2919.052    0.360741    5.423902
 2   El Infiernillo Vol  2            1572.835    1.001999    2.413874
 3   Huites Qp           3            1920.472    0.762506    2.641226
 4   Huites Vol          216          803.068     1.317579    2.335993
 5   Guamuchil Qp        6            430.447     0.676817    1.973478
 6   Guamuchil Vol       4            15.870      0.983551    1.417081
 7   Malpaso Qp          9            1408.943    0.513508    5.187900
 8   Malpaso Vol         150          1972.335    1.914897    1.882710
 9   Los Filtros PMD     [illegible]  42.774      1.178656    4.616619
10   Rio Verde PMD       2            55.730      1.081484    4.527451
11   Tanquian PMD        16           76.011      0.345564    8.324678
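
As a check on how the fitted parameters translate into predictions, the following Python sketch evaluates the seven return periods for record 9 (Los Filtros PMD) with the parameters listed in Tables 3 and 4. It assumes the usual relation p = 1 - 1/Tr between return period and non-exceedance probability for annual maxima, and the explicit inverses of the BEK and BEP PDFs; the function names are hypothetical.

```python
RETURN_PERIODS = (25, 50, 100, 500, 1000, 5000, 10000)


def bek_prediction(tr, beta, alpha, theta):
    """BEK quantile for return period tr (years), with p = 1 - 1/tr."""
    p = 1.0 - 1.0 / tr
    r = p ** (1.0 / alpha)
    return beta * (r / (1.0 - r)) ** (1.0 / theta)


def bep_prediction(tr, beta, alpha, theta):
    """BEP quantile for return period tr (years), with p = 1 - 1/tr."""
    p = 1.0 - 1.0 / tr
    return beta * ((1.0 - p) ** (-1.0 / alpha) - 1.0) ** (1.0 / theta)


# Record 9 (Los Filtros PMD): (beta, alpha, theta) from Tables 3 and 4.
bek_9 = [bek_prediction(tr, 42.399, 0.862399, 5.390516) for tr in RETURN_PERIODS]
bep_9 = [bep_prediction(tr, 42.774, 1.178656, 4.616619) for tr in RETURN_PERIODS]

for tr, q_bek, q_bep in zip(RETURN_PERIODS, bek_9, bep_9):
    print(tr, round(q_bek), round(q_bep))
```

For Tr = 100 years the sketch gives values close to 97 mm (BEK) and 99 mm (BEP), in line with the predictions listed for record 9 in Table 5.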
Selection strategy and results


The general approach for selecting the adopted predictions in each
processed record was based on the minimum values of the EEA and EAM
errors, as well as on the magnitudes of the predictions for the extreme
return periods (Tr) of 1000, 5000 and 10000 years, in order to find
estimates representative of the five obtained for each Tr (Table 5).


Table 5. Contrast of goodness-of-fit indicators and predictions between
the BEK and BEP distributions and the three in general use (LP3, GVE
and LOG), in the 11 hydrological data records processed.

                                   Return periods, in years
NR  PDF  EEA          EAM          25      50      100     500      1000     5000     10000
 1  BEK  1575         622.3        13364   17659   23267   43955    57770    108934   143049
 1  BEP  1072         436.1        15125   21556   30719   69927    99654    226845   323281
 1  LP3  1078         458.2        15257   20645   27643   53011    69619    129550   168642
 1  GVE  1727         652.3        14499   19961   27423   57155    78386    163182   223718
 1  LOG  1829         707.3        14095   19509   27104   58949    82743    182964   257909
 2  BEK  416.0        276.8        6235    8158    10612   19379    25084    45626    59015
 2  BEP  552.8        352.7        5851    7861    10514   20521    27343    53207    70868
 2  LP3  468.2        233.3        7055    9394    12240   21356    26650    43240    52705
 2  GVE  407.2        257.2        6542    8552    11017   19156    24074    40360    50200
 2  LOG  460.5        299.0        6373    8456    11138   20835    27199    50330    65532
 3  BEK  990.8        482.7        9648    13644   19219   42339    59437    130555   183116
 3  BEP  1023.0       505.6        9443    13367   18884   42024    59292    131850   186018
 3  LP3  938.4        411.3        10856   15381   21438   44405    59970    117987   156804
 3  GVE  974.8        453.0        9994    14008   19464   41019    56297    116801   159676
 3  LOG  1056.0       492.8        9696    13680   19246   42424    59637    131640   185161
 4  BEK  88.3         61.1         2086    2624    3284    5491     6845     11409    14211
 4  BEP  94.3         59.8         2198    2799    3539    6025     7559     12772    16003
 4  LP3  64.5         45.4         2109    2501    2899    3854     4278     5287     5732
 4  GVE  86.0         58.1         2155    2672    3265    5005     5949     8732     10242
 4  LOG  101.3        70.8         2111    2670    3354    5622     6999     11593    14387
 5  BEK  1552         620.8        5044    8416    13961   44842    74022    236881   390495
 5  BEP  1639         640.5        4771    8039    13522   45140    75850    253085   425248
 5  LP3  1152         493.1        7779    14739   27483   112419   204067   803624   1444814
 5  GVE  1693         716.1        5864    9679    15832   48841    79098    241647   390542
 5  LOG  1820         729.2        5422    8962    14714   46057    75171    234269   382083
 6  BEK  18.1         11.5         150     226     336     831      1224     3009     4429
 6  BEP  23.3         12.3         156     259     429     1369     2253     7152     11761
 6  LP3  16.6         11.2         199     346     584     1833     2933     8432     13126
 6  GVE  22.3         13.4         148     224     334     825      1208     2913     4246
 6  LOG  23.7         13.9         142     217     326     829      1235     3103     4611
 7  BEK  213.8        106.4        4986    6453    8332    15039    19381    34957    45005
 7  BEP  285.4        152.3        4715    6118    7936    14521    18837    34465    44707
 7  LP3  185.3        93.6         5127    6550    8294    14040    17498    28887    35735
 7  GVE  315.6        139.0        5013    6442    8243    14475    18403    32031    40620
 7  LOG  346.7        160.2        4896    6357    8279    15478    20356    38726    51186
 8  BEK  279.9        163.8        4093    5143    6428    10708    13324    22114    27498
 8  BEP  280.2        151.4        4317    5423    6728    10826    13206    20810    25270
 8  LP3  277.7        141.2        4482    5573    6767    9968     11545    15709    17733
 8  GVE  226.4        140.3        4253    5265    6406    9676     11407    16383    19019
 8  LOG  262.4        165.7        4168    5272    6610    10965    13567    22086    27182
 9  BEK  2.5          1.6          74      85      97      131      149      200      228
 9  BEP  [illegible]  [illegible]  76      87      99      134      152      205      232
 9  LP3  2.9          1.7          74      82      90      107      114      131      139
 9  GVE  3.7          1.5          74      81      88      104      110      123      129
 9  LOG  3.3          1.5          74      83      93      121      135      172      191
10  BEK  4.4          2.9          108     126     147     208      241      341      395
10  BEP  4.1          3.2          106     123     142     198      228      317      366
10  LP3  3.4          2.2          110     126     142     184      204      255      279
10  GVE  3.0          2.2          110     125     141     181      199      244      265
10  LOG  3.8          2.9          109     127     147     208      241      339      392
11  BEK  13.0         7.2          220     269     329     521      635      1005     1224
11  BEP  12.2         6.4          233     296     377     659      839      1468     1868
11  LP3  11.9         6.7          231     283     344     535      644      985      1180
11  GVE  15.6         7.6          228     280     344     551      673      1071     1307
11  LOG  16.3         7.9          223     278     348     598      759      1337     1712

NR = record number, according to Tables 3 or 4.

PDF = probability distribution function tested.

EEA = standard error of fit, in m³/s, Mm³ or mm, according to the data.

EAM = mean absolute error, in m³/s, Mm³ or mm, according to the data.


Only for record 3 (Huites Qp) did the LP3 distribution lead both to the
lowest fit errors and to the adopted predictions, given the
extraordinary similarity shown by all the estimates.


It is important to highlight that the LP3 distribution led to the lowest
fit errors in records 4, 5, 6 and 7. In the first and last of these, it
was not selected because it led to very low predictions; in records 5
and 6, on the contrary, because it led to very high ones. For these four
records, the CEs were 1.015, 1.831, 1.619 and 1.226; evidently, in
records 5 and 6 the method of moments in the logarithmic domain is not
applicable. For the rest of the processed records, the CE was not less
than 1.00.


In record 1, the BEP distribution reported the lowest fitting errors,
but all its predictions are considered high. For this reason, the GVE
distribution was adopted, as it has much lower fit error values than the
LOG model. In record 2, the LP3 and GVE distributions led to the lowest
fitting errors; the former was adopted because of its more severe
predictions.


In record 4, the GVE distribution provided the next-lowest fitting
error values, but its predictions were low; therefore, the BEK
distribution was adopted, with low errors and representative
predictions, similar even to those of the LOG model.


In records 5, 6 and 7, the adopted BEK distribution reported the 
second lowest fitting error values and its predictions are accepted as 
representative, due to the great similarity they show with those of the 
GVE and LOG models. 

In records 8, 9 and 10, the GVE, BEP and GVE distributions,
respectively, were adopted because they reported the minimum fitting
errors. Regarding their predictions, those of the GVE can be considered
slightly low and those of the BEP somewhat high, when compared to those
of the LOG model.

In record 11, the LP3 and BEP distributions led to the smallest fitting
errors; the latter was adopted because of its more severe predictions.
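The choice described for record 11 can be retraced from the Table 5 rows with a few lines of Python. This is only a simplified sketch: the rule coded in the hypothetical function `select` (take the models with the lowest EEA and the lowest EAM, then keep the one with the larger 10 000-year prediction) is an assumption that mirrors this single case, whereas the selection in the other records also relied on judgment about how representative the predictions were.

```python
# Table 5 rows for record 11: (EEA, EAM, prediction for Tr = 10 000 years).
record_11 = {
    "BEK": (13.0, 7.2, 1224),
    "BEP": (12.2, 6.4, 1868),
    "LP3": (11.9, 6.7, 1180),
    "GVE": (15.6, 7.6, 1307),
    "LOG": (16.3, 7.9, 1712),
}


def best_by(rows, index):
    """Name of the distribution with the smallest value in column `index`."""
    return min(rows, key=lambda name: rows[name][index])


def select(rows):
    """Hypothetical rule: among the lowest-EEA and lowest-EAM models,
    keep the one with the more severe (larger) extreme prediction."""
    candidates = {best_by(rows, 0), best_by(rows, 1)}
    return max(candidates, key=lambda name: rows[name][2])


print("lowest EEA:", best_by(record_11, 0))
print("lowest EAM:", best_by(record_11, 1))
print("adopted   :", select(record_11))
```

With these rows, LP3 has the lowest EEA, BEP has the lowest EAM, and the rule returns BEP, in agreement with the text.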

Figure 1 and Figure 2 show the Q-Q diagnostic graphs of the two best
fits achieved: the BEK distribution in record 7 and the BEP
distribution in record 9.


[Q-Q plot; axes: estimated annual Qp and observed annual Qp]

Figure 1. Q-Q graph of record 7 of annual Qp at the Malpaso dam,
obtained with the BEK distribution.

Figure 2. Q-Q graph of record 9 of annual PMD at Los Filtros station,
obtained with the BEP distribution.
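A Q-Q graph such as those in Figures 1 and 2 pairs each ordered observation with the value the fitted model assigns to the same probability. The sketch below builds those pairs and labels each point as matched, underestimated or overestimated. It assumes the Weibull plotting position, m/(n + 1), and a user-chosen tolerance; the names `qq_pairs` and `classify`, the stand-in model and the sample are hypothetical.

```python
import math


def qq_pairs(observed, quantile):
    """Pair each ordered observation with the model estimate at m / (n + 1)."""
    x = sorted(observed)
    n = len(x)
    return [(xi, quantile(m / (n + 1))) for m, xi in enumerate(x, start=1)]


def classify(pairs, tol):
    """Label each point: 'match' within tol, else 'under' or 'over' estimate."""
    labels = []
    for obs, est in pairs:
        diff = est - obs
        if abs(diff) <= tol:
            labels.append("match")
        elif diff < 0:
            labels.append("under")
        else:
            labels.append("over")
    return labels


# Hypothetical sample and stand-in model, for illustration only.
sample = [74.0, 58.0, 66.0, 91.0, 49.0, 120.0, 81.0]


def stand_in_quantile(p, loc=62.0, scale=18.0):
    """Gumbel quantile used as a placeholder for a fitted BEK or BEP model."""
    return loc - scale * math.log(-math.log(p))


for (obs, est), label in zip(qq_pairs(sample, stand_in_quantile),
                             classify(qq_pairs(sample, stand_in_quantile), 3.0)):
    print(f"observed {obs:6.1f}  estimated {est:6.1f}  {label}")
```

Points labeled "under" fall below the 1:1 line when the estimate is plotted on the vertical axis, which is how an underestimated run of large values would appear in such a graph.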


In Figure 1, the BEK model reproduces the sample data exactly up to
datum number 38; from there, it underestimates the following six values
and datum 46, and finally it slightly overestimates the 45th and the
last. In Figure 2, by contrast, only a lack of accuracy, which is not
severe, is detected in the last six data.

Conclusions


The contrasted BEK and BEP distributions have the following two
advantages: (1) they show great versatility to represent records of
extreme hydrological events, due to their heavy right tail and their
two shape parameters; and (2) their maximum likelihood fitting method
is efficient, simple and computationally uncomplicated. For these
reasons, their routine inclusion in the frequency analysis of extreme
hydrological events is recommended.


In the eleven records processed, the predictions for the first three
return periods (Tr ≤ 100 years) are quite similar. In general, the PDF
adopted for having the lowest EEA and EAM values leads to representative
predictions in the last four return periods (Tr ≥ 500 years). These two
conditions give confidence in all the calculated and adopted
predictions.
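How the agreement among the five PDFs changes with the return period can be checked directly on the Table 5 rows. The sketch below does so for record 11 only, using a relative spread, (maximum minus minimum) divided by the mean, which is an illustrative measure chosen here and not one used in the study. The non-exceedance probability 1 - 1/Tr is printed alongside each return period.

```python
return_periods = [25, 50, 100, 500, 1000, 5000, 10000]

# Predictions for record 11, copied from Table 5.
record_11 = {
    "BEK": [220, 269, 329, 521, 635, 1005, 1224],
    "BEP": [233, 296, 377, 659, 839, 1468, 1868],
    "LP3": [231, 283, 344, 535, 644, 985, 1180],
    "GVE": [228, 280, 344, 551, 673, 1071, 1307],
    "LOG": [223, 278, 348, 598, 759, 1337, 1712],
}


def relative_spread(values):
    """(max - min) / mean of the predictions for one return period."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


spreads = []
for i, tr in enumerate(return_periods):
    column = [preds[i] for preds in record_11.values()]
    spreads.append(relative_spread(column))
    print(f"Tr = {tr:>5} years  P = {1 - 1 / tr:.4f}  spread = {spreads[-1]:.1%}")
```

For this record, the spread grows steadily with the return period, from about 6 % at Tr = 25 years to about 47 % at Tr = 10 000 years, which illustrates why the choice of PDF matters mostly in the extreme return periods.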


The observations drawn from the results in Table 5 (errors and
predictions) allow us to suggest the routine application of the BEK and
BEP distributions to complement those of general application (LP3, GVE
and LOG), especially for selecting the predictions to be adopted in the
three extreme return periods of 1,000, 5,000 and 10,000 years. This was
verified with the BEK distribution, which was adopted in two Qp and two
Vol records (4 to 7) of the eight processed, and with the BEP model,
which was adopted in two PMD records (9 and 11) of the three analyzed.


It should be noted that these conclusions are based on the results of
the 11 records processed and may therefore give the impression that the
BEK and BEP distributions are better than those of general application,
but this is not the case; rather, they should be regarded only as
practical and feasible options for frequency analyses of extreme
hydrological data.